Glioblastoma multiforme (GBM) is the most common malignant 
adult brain tumor. Standard GBM treatment includes maximal 
safe surgical resection with combination radiotherapy and 
adjuvant temozolomide (TMZ) chemotherapy. Alarmingly, 
patient survival at five years is below 10%. This is in part due to 
the invasive behavior of the tumor and the resulting inability 
to resect greater than 98% of some tumors. In fact, recurrence 
after such treatment may be inevitable, even in cases where 
gross total resection is achieved. The Cancer Genome 
Atlas (TCGA) research network performed whole genome 
sequencing of GBM tumors and found that GBM recurrence 
is linked to epigenetic mechanisms and pathways. Central to 
these pathways are epigenetic enzymes, which have recently 
emerged as possible new drug targets for multiple cancers, 
including GBM. Here we review GBM treatment, and provide 
a systems approach to identifying epigenetic drivers of GBM 
tumor progression based on temporal modeling of putative 
GBM cells of origin. We also discuss advances in defining 
epigenetic mechanisms controlling GBM initiation and 
recurrence and the drug discovery considerations associated 
with targeting epigenetic enzymes for GBM treatment. 



Surgical and Pharmacological Management of GBM 

In 90% of cases, GBMs arise de novo as primary tumors without 
progression from lower grade tumors, while secondary GBMs 
originate from previously diagnosed low-grade astrocytomas. 
Maximal safe resection of a primary GBM is the mainstay of 
treatment and confers improved prognosis. Patients in whom more 
than 98% of the tumor volume is resected have a median survival of 
13.1 mo, compared with 8.8 mo in patients from whom less of the 
tumor is resected. 1 

Because GBMs have infiltrating cells, the entire tumor can- 
not be removed. For this reason, most GBM patients will fol- 
low a standard treatment regimen after the tumor is resected. 
This consists of 6 weeks of external beam radiation 5 times a 
week plus oral temozolomide daily. Temozolomide (TMZ) is an 
alkylating agent whose therapeutic benefit arises from its abil- 
ity to alkylate/methylate DNA; methylation damages DNA and 
triggers tumor cell death. However, some tumor cells are able to 
repair this type of DNA damage by expressing an enzyme called 
O-6-methylguanine-DNA methyltransferase (MGMT), thereby 
diminishing the therapeutic efficacy of TMZ. 

Unfortunately, most patients will have a recurrence of 
GBM within 6.9 mo of their original diagnosis. Essentially all 
GBMs recur after initial therapy. Re-operation and re-radiation 
are treatment options for only a minority of patients; 80% or 
more of GBM recurrences occur in the same area as the original 
tumor, precluding additional radiation therapy because of toxic- 
ity concerns. 

Greater knowledge of the cellular, genetic, and epigenetic ori- 
gin of GBM is the key for advancing GBM treatment. Clinical 
researchers are analyzing freshly resected tumors for genetic and 
epigenetic modifications in collaboration with genomics and 
drug discovery groups. These studies are coupled to derivation 
of cell lines from patients' tumors in the hope of identifying and 
developing personalized drug treatment regimens. 

Cell of Origin for Glioblastoma 

A major drug discovery challenge is defining the cellular origin of 
GBM since it is difficult to develop a successful GBM treatment 
without first uncovering the responsible cell type to eliminate. 
Because epigenetic modifications, enzymes, and non-coding RNAs 
are often cell type specific, these cellular elements are prime 
candidates for identifying the cell of origin. However, 
such determinations are often difficult since the cell acquiring a 
mutation (cell of mutation) may not be the same as the cell of origin 
(ref. 2; Fig. 1). For instance, it is possible that neural stem cells 
pass on mutations to downstream progeny such as oligodendro- 
cyte precursor cells (OPCs), which are putative glioma cells of 
origin. This is the model Liu et al. proposed by labeling differ- 
ent cell populations using mosaic analysis with double markers 
(MADM) in mice. 3 Briefly, glioma was induced by sporadic 
introduction of Cre-mediated deletion of Neurofibromin-1 
(Nf1) and p53 in neural stem cells, via nestin or GFAP promoter 
mediated expression. After Cre-mediated recombination and 
proliferation of the neural stem cells, the progeny that contained 
homozygous deletion of Nf1 and p53 was correlated with green 
fluorescent protein (GFP) expression while wild-type cells were 
labeled red with red fluorescent protein (RFP). By performing 
single cell analysis, the authors determined that OPCs expanded 
upon Nf1 and p53 mutation, while neural stem cells 
and other lineages were not overrepresented. Consistent with this 
notion, introducing Nf1 and p53 mutations directly into OPCs 
in vitro induced gliomagenesis, 3 which is seen clinically in several 
genetic diseases with predisposition to glioma including 
Li-Fraumeni syndrome (TP53 mutation) and Neurofibromatosis 
type 1. 

Figure 1. Possible cells of origin of glioma. Studies in mouse models have shown that various cell types can give rise to glioma. Neural stem cells 
(NSCs) give rise to other neural stem cells, astrocytes, astrocyte-like cells, and neurons. Liu et al. 2011 demonstrated that NSCs give rise to OPCs, 
which can give rise to glioma. 3 Friedmann-Morvinski et al. 2012 demonstrated that astrocytes and neurons can give rise to glioma. 5 Hambardzumyan 
et al. 2011 demonstrated that astrocytes can give rise to glioma after PDGF overexpression and Ink4a and ARF deletion. 6 Koso et al. 2012 demonstrated 
that overexpression of a mutagenic Sleeping Beauty (SB) transposon (T2/Onc2) along with a dominant negative p53 in astrocyte-like cells can give rise 
to glioma. 4 Chen et al. 2012 demonstrated that NSCs could give rise to glioma after Nf1, p53, and Pten deletion. 2 Neural stem cells can give rise to 
proneural, mesenchymal, and neural cell lineages. 

In contrast to Liu et al., 3 Koso and colleagues 4 suggested that 
the cell of origin in some GBMs is not an OPC but an astroglial-like 
cell. They further postulated that originating mutations can 
occur in NSCs (Fig. 1). They introduced a mutagenic transposon 
into Nestin-Cre mice along with dominant negative (DN) p53 and 
observed 100% glioma formation. Thus, it is likely that 
multiple cells of origin give rise to glioma. Further, studying the 
genetic and epigenetic landscape of human OPCs and other cells 
as they are differentiating could uncover epigenetic enzymes and 
pathways misregulated in gliomagenesis. 

Recent studies suggest that determining epigenetic modifications 
during neural differentiation will likely give insight into 
de-differentiation processes, which may give rise to GBM. Verma 
and colleagues demonstrated that neurons de-differentiate and 
become tumor-initiating cells in mouse models of glioma (ref. 5; 
Fig. 1). They used a modification of the Sleeping Beauty system, 
which concurrently knocked down p53 and Nf1 using shRNAs targeting 
these transcripts. In this modified system, LoxP sites flank RFP, 
which is deleted using specific Cre expression. Since the presence 
of GFP is constitutive and is not deleted, a GFP/RFP ratio could 
be attained and mosaic analysis could be performed after Cre 
mediated deletion (Fig. 1). Since they used a Cre construct that is 
only expressed in neurons (Synapsin I-Cre), they could follow Cre 
mediated recombination in neurons via loss of RFP. Performing 
this analysis, they demonstrated that mature neurons (NeuN 
and Tuj1 positive) not expressing RFP, GFAP (astrocyte marker), 
and Ki-67 (proliferation marker), induced glioma formation. To 
provide further evidence that neurons can induce glioma, they 
isolated cortical neurons from Synapsin I-Cre mice, transduced 
them with shRNAs targeting p53 and Nf1 in vitro and demonstrated 
that they give rise to high-grade glioma when injected 
in immune-compromised mice. Similarly, they used Nestin-Cre 
and GFAP-Cre mice to demonstrate that mutation of neural 
stem cells or astrocytes can produce high-grade glioma in mice 
(Fig. 1). Parallel studies showed that targeting astrocytes promotes 
glioma formation. Hambardzumyan et al. 6 overexpressed 
PDGFB in astrocytes derived from Ink4a-ARF-/- mice (Fig. 1) 
and observed robust glioma induction in different brain regions. 
Collectively these findings suggest that glioma may arise from 
either de-differentiating neural stem cells or astrocytes. 

Importantly, de-differentiating neurons or astrocytes that 
give rise to glioma might be the very cells that are resistant to 
TMZ therapy and induce tumor recurrence. 7 Parada and colleagues 
provided evidence that cancer stem cells may be responsible 
for GBM recurrence by using a nestin-ΔTK-IRES-GFP 
mouse model that labels quiescent subventricular adult neural 
stem cells, and breeding this transgenic mouse to a well-established 
mouse model of gliomagenesis (hGFAP-Cre; Nf1fl/+; p53fl/fl; 
Ptenfl/+) (ref. 7; Fig. 1). As expected, TMZ treatment eliminated 
tumors in these mice, but, surprisingly, the cells that returned 
and proliferated were GFP+, suggesting that they were neural 
stem cells. Further, treatment with ganciclovir, which targets 
cells containing a modified version of the herpes simplex virus 
thymidine kinase (ΔTK), eliminated these cells and reduced tumor 
growth in vivo. These exciting studies suggest the presence of 
cancer stem cells in glioma and are supported by concurrent 
studies in papilloma and other cancers. 8 They further suggest 
that understanding the epigenetic changes in cancer stem cells 
after TMZ treatment is key to identifying small molecule inhibi- 
tors of GBM recurrence. 

Genetic and Epigenetic Characteristics 
of Glioblastoma 

The remarkable advances in defining the GBM cell of origin 
have been paralleled by insights into the genetic and epigenetic 
underpinnings of this disease. GBM was the first cancer studied 
by The Cancer Genome Atlas (TCGA; ref. 9) where sequenc- 
ing of over 200 different tumors identified the EGFR, PDGFR, 
PI3K, NF1, TP53, and Rb pathways as misregulated in GBM. 9 
Other studies have uncovered mutations or fusions of other genes 
such as IDH1/IDH2 and FGFR, respectively, in subsets of GBM 
patients. 10-12 

However, there are only a few GBM mutational "drivers," 
suggesting that we may have to expand our search for "drivers" 
beyond somatic mutations to understand the genomic networks 
misregulated in GBM. 13 In simple terms, a driver event is usu- 
ally defined as one that occurs early in tumorigenesis and occurs 
in pathways considered critical to the development of any of 
the hallmarks of cancer. 14-16 A study in collaboration with The 
Cancer Genome Atlas (TCGA) Research Network proposed four 
subtypes of GBM based on genomic profiling of hundreds of 
human samples. 17 These four subtypes have been named "pro- 
neural," "mesenchymal," "classical," and "neural." Proneural 
GBMs show altered expression of PDGFRA, IDH1, and TP53 
mutation and loss of heterozygosity (LOH) along with PTEN 
mutation and CDKN2A loss. Mesenchymal GBMs have deletion 
of NF1, mutation of TP53 and PTEN, and loss of CDKN2A. 
Classical GBMs are typified by EGFR amplification and lack of 
PTEN and CDKN2A. Finally, the Neural GBMs show a strong 
expression of neuron markers and genes associated with neuron 
projection and axon and synaptic transmission. These subgroups 
might develop from different cells of origin. Moreover, Verhaak 



et al. 17 discovered that aggressive treatment significantly reduces 
the mortality in Classical and Mesenchymal subgroups but does 
not significantly improve survival in the Neural and Proneural 
groups (P > 0.05). Subtype and MGMT methylation status were 
not significantly correlated, indicating that a patient's response 
can be evaluated independent of MGMT status. 

Supporting evidence for a potential driving event may include 
data showing that the event occurs in a substantial fraction of 
GBM patient samples. For example, Schwartzentruber et al. 15 
demonstrated that somatic mutations in the H3.3-ATRX-DAXX 
chromatin-remodeling pathway frequently occur in pediatric 
GBMs and are associated with alternative lengthening of telo- 
meres and genomic instability. 

Another approach to discovery is to consider the epigenetic 
drivers of gliomagenesis. Several reviews have detailed the histone 
and DNA modifications specific to GBM that can be used to 
expand the current search for "drivers." 18-21 For example, Sturm 
et al. 22 incorporated the mutational status of H3F3A and IDH1 
with differences in global methylation patterns in GBMs to iden- 
tify 6 distinct epigenetic subgroups, which correlate with distinct 
clinical characteristics. Here we will concentrate on microRNAs 
(approximately 22 nucleotide RNAs) and long non-coding RNAs 
(greater than 200 nucleotide RNAs) that affect gene expres- 
sion through regulation of mRNA stability and transcription 
regulation. MicroRNAs are non-coding RNAs, which bind to 
microRNA response elements (MREs) in target mRNAs. Once 
the miRNA is loaded into the RISC complex (RNA-induced 
silencing complex), the miRNA/RISC complex binds the target 
mRNA, thereby modulating its stability. miRNAs dysregulated 
in glioma include miR-10b, which is expressed in glioma tumors 
and stem cells, but not neuronal progenitors, mature glia, or neurons. 23-26 
miR-10b controls GBM cell and stem cell cycle traverse 
and is correlated with poor prognosis (http://tcga-data.nci.nih.gov/tcga). 
The targets and pathways controlled by miR-10b are 
under investigation. However, network analysis of miR-10b targets 
and follow-up studies will be required to fully understand its 
role in GBM cell proliferation. 

Three elegant studies elucidated a microRNA network that 
regulates RNA-RNA interactions in oncogenic pathways 
controlling GBM. 27-29 While one of the studies utilized a melanoma 
model, all three studies concentrated on an established 
GBM tumor suppressor, PTEN. PTEN is a phosphatase that 
opposes the PI3K pathway, a known GBM driver. PTEN levels 
are controlled by miR-26A, which is amplified in GBM. 27-29 
The current thinking is that in addition to well-established 
PTEN somatic mutations, PTEN levels are controlled by miR- 
NAs, which may be misregulated in tumors. What these papers 
demonstrate is that miRNA networks are more complex than we 
had previously anticipated. Specifically, competitive endogenous 
RNAs (ceRNAs) control the amount of each miRNA species 
(Fig. 2). In the case of miR-26A, if there is an excess of mRNAs 
that contain the microRNA recognition elements (MREs) rela- 
tive to PTEN mRNA, PTEN protein levels should rise. By con- 
trast, removing a PTEN mRNA competitor should lower PTEN 
levels since there would be more miRNA to bind PTEN mRNA 
(Fig. 2A). This is what the authors observed: lowering PTEN 
competitors reduced PTEN levels and accelerated tumor growth. 
Similarly, Fang and colleagues 30 found that an increase of versican 
3'-UTR produced competition for 4 different miRNAs resulting 
in higher versican, fibronectin, and CD34 mRNA and protein 
levels, thereby leading to cell proliferation and tumor growth 
(Fig. 2B). Interestingly, in order to shed light on these mechanisms 
Sarver and Subramanian 31 built an open access database 
where competing endogenous mRNA rankings are generated using 
conserved mRNA-miRNA interactions. The user enters an mRNA 
for which they are interested in finding potential competing 
mRNAs, and the tool returns potential ceRNA regulators. This 
useful database allows us to delineate pathways controlling 
glioma progression based on an understanding of miRNA-ceRNA 
networks. 

Figure 2. Competing endogenous RNA levels modulate expression of oncogenes and tumor 
suppressors. (A) The levels of oncogenes can be modulated by an increase in levels of competing 
endogenous RNAs. microRNAs are titrated away from mRNAs encoding oncogenes when the 
levels of competing endogenous RNAs are increased. This leads to increased expression of 
oncogenes. (B) The levels of tumor suppressor proteins are modulated by decreases in the levels of 
competing endogenous RNAs. microRNAs bind to RNAs of tumor suppressors, thereby reducing 
tumor suppressor protein expression after transformation. 

miRNA-ceRNA networks are likely further controlled by long 
non-coding RNAs (lncRNAs), which control global gene 
repression. 32-35 lncRNAs control multiple tumor suppressor 
proteins and oncogenes. 27-29,32-35 lncRNAs modulate 
transcription, regulate post-transcriptional RNA processing, 
influence translation, 36 and alter DNA methylation and chromatin 
architecture through local (cis) and long distance (trans) 
mechanisms. 37 Through interactions with transcription factors, 
co-activators and/or repressors, lncRNAs can affect different 
aspects of gene transcription to form a fine-tuned complex 
regulatory network. lncRNAs also modulate gene expression by 
recruiting chromatin remodeling complexes like histone 
methyltransferases to specific genomic loci. 

The various regulatory functions of lncRNAs may play a crucial 
role in GBM development and progression. For example, the 
lncRNA MEG3 has been implicated in glioma cell proliferation. 38 
Interestingly, MEG3 expression is associated with differential 
methylation. Han et al. 39 investigated lncRNA expression between 
GBM and normal samples, and discovered several lncRNAs 
implicated in glioma signaling. Their findings suggest that two 
lncRNAs, ASLNC22381 and ASLNC20819, which target IGF-1, may be 
important in GBM progression and recurrence. Since some 
lncRNAs interact with miRNAs, 40 it will be important to further 
investigate their relationship in glioma. Katsushima et al. 41 
investigated miRNAs in a glioma stem cell (GSC) model, 
identifying miR-1275 as being associated with GSC differentiation, 
GBM heterogeneity, and tumor cell proliferation. Interestingly, 
miR-1275 expression was shown to be associated with histone 
H3 lysine 27 trimethylation (H3K27me3), and subsequent studies 
have shown a molecular interplay between lncRNAs and 
H3K27me3 in gene silencing. 42 This implicates lncRNAs as 
potential factors in the complex genomic interactions in glioma. 
Unpublished data from our laboratory identified several lncRNAs 
that are differentially expressed in GBM compared with control 
tissues (Pastori et al., unpublished observations). However, the 
challenge is defining whether lncRNA up- or downregulation in 
GBM is a driver or passenger event. We argue here that 
determining this is only possible using systems biology and network 
modeling approaches. 
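
The ceRNA titration logic described above for miR-26A and PTEN can be illustrated with a deliberately simple sketch. It assumes, purely for illustration, that a fixed pool of miRNA molecules is shared among all MRE-bearing transcripts in proportion to their abundance; the function name, the units, and the numbers are hypothetical and are not taken from the cited studies.

```python
def free_target_mrna(target, competitor, mirna):
    """Unrepressed target mRNA under a toy proportional-titration model.

    Assumption (not from the source): every transcript carries one
    equivalent MRE and miRNA molecules are shared among transcripts in
    proportion to abundance.
    """
    total_sites = target + competitor
    if total_sites <= 0:
        return 0.0
    # Fraction of MRE-bearing transcripts occupied by miRNA (capped at 1).
    bound_fraction = min(1.0, mirna / total_sites)
    return target * (1.0 - bound_fraction)


# Hypothetical abundances in arbitrary units.
baseline = free_target_mrna(target=100, competitor=100, mirna=150)
more_cerna = free_target_mrna(target=100, competitor=300, mirna=150)
less_cerna = free_target_mrna(target=100, competitor=50, mirna=150)

print(less_cerna, baseline, more_cerna)
```

In this toy setting, adding competitors frees more target mRNA and removing them frees less, which is the qualitative behavior the ceRNA hypothesis predicts. Real networks involve binding affinities, multiple MREs per transcript, and transcript turnover, none of which are modeled here.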

Epigenetic Enzymes as Therapeutic Targets 
in Glioblastoma 

Determining the miRNAs, ceRNAs, and lincRNAs that con- 
trol levels of epigenetic enzymes will be critical in elucidating 
whether these enzymes are GBM drivers. Epigenetic enzymes 
are gaining considerable attention due to their druggabil- 
ity and overexpression in certain cancers. Our group profiled 
150 epigenetic enzymes in 27 GBM human samples using the 
Nanostring platform (Daniel et al., unpublished observations). 
Several chromatin writers, readers and erasers appear to be dif- 
ferentially expressed in GBM. Consistent with prior reports, 43,44 
we found that EZH2 is upregulated in GBM samples (Daniel et 
al., unpublished observations). Ezh2 is one of the most studied 
epigenetic enzymes as a lysine methyltransferase that modifies 
histones, thereby repressing expression of some tumor suppressor 
proteins including Cdkn2a/b. 45 Ezh2 may also promote cancer 
cell proliferation by modifying non-histone proteins to create 
methyl-degrons recognized by ubiquitin ligases that induce cell 
cycle traverse. 46 Further, Ezh2 interacts with lncRNAs that are 
overexpressed in multiple tumors. 47-50 Ezh2 levels are high in 
GBM relative to normal brain tissue and Ezh2 is required for 
GBM stem cell maintenance. 43,44 However, whether Ezh2 is a good 
therapeutic target in GBM is unclear since somatic Ezh2 mutations 
have not been described in GBM. By contrast, Ezh2 mutations 
are present in lymphoma, where Ezh2 inhibitor sensitivity is correlated 
with mutational status. 51,52 Highly specific small-molecule 
Ezh2 inhibitors have been recently described that potently reduce 
lymphoma cell proliferation in vitro and in vivo. 51,52 These inhibitors 
are considerably more potent at reducing proliferation of cells 
harboring Ezh2 mutations than those bearing wild type Ezh2. 51,52 
However, GBM cells containing aberrant modulation of Ezh2 
and Ezh2 binding RNAs may be equally sensitive to small mol- 
ecule Ezh2 inhibitors. For instance, Ezh2 mRNA is regulated 
by miRNA-101, a microRNA downregulated in GBM. 53,54 Thus, 
comparing the Ezh2/miRNA-101 network including ceRNAs in 
normal and GBM cells may be a means of validating Ezh2 as a 
therapeutic target in GBM. In addition, whether long non-cod- 
ing and natural antisense transcripts regulate Ezh2 abundance in 
GBM should be determined to help validate Ezh2 as a target in 
GBM (Fig. 2). Similar analyses for other epigenetic enzymes that 
are likely to be regulated by miRNAs, ceRNAs, and lncRNAs 
should be performed in order to gain a global view of their regu- 
lation in GBM. After determining putative epigenetic enzymes 
misregulated in glioma using network analysis, prioritization of 
those enzymes required for GBM stem cell survival will be cru- 
cial. Mouse models of GBM, which delete different driver muta- 
tions, can then be used to help validate the prioritized enzymes as 
drug targets in GBM. 



Drug Discovery Challenges in Glioblastoma 

Once integrative modeling of neuronal differentiation and 
miRNA and lncRNA networks is performed, the next challenge 
is to identify epigenetic targets for GBM treatment and 
subsequently develop therapeutic strategies for drug discovery. 
Ongoing clinical trials are testing HDAC inhibitors for the treat- 
ment of GBM (NCT01378481, NCT00302159). Further, the 
identification of IDH1/2 mutations in GBM suggests that meta- 
bolic pathways may be attractive targets for GBM. 55 However, 
the drug discovery challenges associated with targeting epigenetic 
enzymes or lncRNA-miRNA-protein interactions in GBM 
are the same as those of targeting any cell or target in the brain, which 
is protected by the blood-brain barrier (BBB). The BBB protects 
the brain by forming a highly selective barrier that blocks 
the entry of large, hydrophilic molecules. The BBB also makes 
delivery of drugs affecting GBM and other neurologic disorders 
challenging. Moreover, drug transporters effectively pump small 
molecules from the brain. These include the multidrug resistance 
protein (MDR) and multidrug resistance-associated protein 
(MRP; ref. 56). The first multidrug resistance protein studied, 
P-glycoprotein, is considered to be one of the most important 
transporters at the BBB, since mice lacking this transporter have 
higher levels of small molecules in their brain. 56 Further, drugs 
such as verapamil and cyclosporine A, which effectively inhibit 
P-glycoprotein activity in vitro, substantially increase brain reten- 
tion of anti-cancer agents such as vincristine. 56 Thus, it may be 
therapeutically advantageous to simultaneously inhibit an epi- 
genetic target and P-glycoprotein. This could in theory be done 
pharmacologically by developing compounds that target both 
classes of proteins. However, this may be difficult and may be 
better achieved by performing network analysis to determine 
that a specific epigenetic enzyme/microRNA/lncRNA pathway 
is responsible for maintaining P-glycoprotein levels, and subse- 
quently identifying small molecules that disrupt that network. 

Epigenetic Systems Biology 
and Regulatory Networks 

Recent attempts to characterize the GBM epigenetic/genetic 
landscape have utilized integrative, systems biology approaches 
that include multiple types of high-throughput data. 18 The types 
of data involved can include genetic mutations, RNA expres- 
sion, as well as methylation and protein expression. This poses 
challenges for statisticians, mathematicians, and bioinformati- 
cians, as well as computational and medical researchers. Recent 
approaches have focused on examining the latent or underlying 
biological pathways in data repositories from glioma studies and 
the correlations or relationships between them. For example, 
Fronza et al. 57 examined the interactions between groups of 
mRNAs and miRNAs in four glioma data sets and identified 
miRNA clusters, which they validated as being associated with 
survival in GBM. Wuchty et al. 58 also examined miRNAs and 
mRNAs in order to discover significant miRNA-mRNA interac- 
tions in GBMs. Kunkle et al. 59 integrated data on genetic vari- 
ants and genes responsive to environmental exposures, along 
with various networking databases, to identify genes and pathways 
involved in gene-environment interactions, which may play 
a role in GBM development. The above studies use associations 
between different types of genomic data to infer possible regula- 
tory systems and disease pathways. 
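
As a minimal illustration of how associations between two data types can suggest regulatory relationships, the sketch below flags miRNA-mRNA pairs whose expression is strongly anti-correlated across samples. This is a generic screen and not the method of any study cited above; the labels (`miR-A`, `gene-1`, `gene-2`), the expression values, and the correlation threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_interactions(mirna_expr, mrna_expr, threshold=-0.8):
    """Return (miRNA, mRNA, r) tuples with r <= threshold, most negative first."""
    pairs = []
    for mir, mir_values in mirna_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson(mir_values, gene_values)
            if r <= threshold:
                pairs.append((mir, gene, round(r, 3)))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical expression across five samples.
mirna_expr = {"miR-A": [1, 2, 3, 4, 5]}
mrna_expr = {"gene-1": [10, 8, 6, 4, 2], "gene-2": [3, 1, 4, 1, 5]}

hits = candidate_interactions(mirna_expr, mrna_expr)
print(hits)
```

Anti-correlation alone does not establish direct targeting. In practice such candidates would be filtered against predicted binding sites and corrected for multiple testing.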

A particularly interesting way to gain therapeutic insights 
for glioma is to examine the epigenetic landscape of neural dif- 
ferentiation. As stated previously, determining the epigenetic 
modifications of neuronal precursors as they are differentiating is 
important if OPCs, neurons and differentiated astrocytes indeed 
give rise to GBM in humans. Increasing evidence suggests that 
epigenetic pathways play a critical role in the regulation of neuro- 
genesis. 60 In order to examine the epigenetics of differentiation, 
we must consider biological processes over time; the modeling 
of such processes may involve methods from time series analysis, 
Markov chain modeling, and dynamic Bayesian networks. 61 

Traditional time series methods allow for the modeling of 
temporal processes where measurements at different time points 
are correlated or dependent. 62 They can be used with real-valued 
continuous data, as well as discrete data, and can be particularly 
useful in cyclical contexts. This is because time series methods 
allow the use of periodic functions to model changes in a variable 
over several, repeating cycles. Of course most time series methods 
were developed for series of some length — from tens to thousands 
of time points — and for series involving a single response vari- 
able. Cell cycle studies, particularly in epigenetics, may involve 
hundreds or thousands of short series, one for each epigenetic or 
genomic variable and each cell cycle. This leads us to methods 
that find clusters of transcripts and describe these clusters. 63,64 
These clusters may correspond to groups of transcripts being 
targeted by a specific miRNA, a pool of miRNAs, or a specific 
ncRNA. Different clusters may also correspond to transcripts 
involved in specific biological processes associated with differen- 
tiation as well as with the GBM transcriptome (Fig. 3). 
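
The idea of grouping many short series by the shape of their temporal profile can be sketched with a small k-means example. This is one simple choice among many clustering methods and is not the specific approach of the cited references; the transcript labels (`t1` to `t4`), the four time points, and the number of clusters are hypothetical.

```python
import math


def zscore(series):
    """Standardize a series so clustering reflects shape, not magnitude."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in series]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centroids = [profiles[0]]
    while len(centroids) < k:
        # Next centroid: the profile farthest from all current centroids.
        centroids.append(
            max(profiles, key=lambda p: min(sqdist(p, c) for c in centroids))
        )
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in profiles
        ]
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical transcript levels at four time points of differentiation.
series = {
    "t1": [1, 2, 4, 8],
    "t2": [10, 22, 41, 79],
    "t3": [9, 5, 3, 1],
    "t4": [80, 40, 21, 10],
}
names = list(series)
labels = kmeans([zscore(series[name]) for name in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Standardizing each series first means that transcripts with very different abundance but the same rising or falling pattern fall into the same cluster, which is the property of interest when looking for co-regulated groups.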

Models based on Markov chain methods 62 can view temporal 
processes as discrete systems, where at each time point the sys- 
tem is in one of a number of possible "states." A Markov chain 
considers the movement of a biological system as a set of transi- 
tions among different "states" over time, e.g., the movement of 
a cellular population through different phases of development 
or responses to stimuli. Each possible movement among states is 
assigned a probability expressing the likelihood of that particu- 
lar move. A hidden Markov model assumes that these states are 
"hidden" or not directly observed, and must be inferred from the 
observed data. In order to process epigenetic data of potentially 
high dimensions, these methods may be coupled with dimension 
reduction techniques such as clustering, variable selection and 
shrinkage. 63,65 Dimension reduction assumes that the observed 
high dimensional data in glioma is a complex representation of 
several lower dimensional genomic processes. For example, we 
may measure a large group of transcripts from an oncologic 
pathway such as mTOR or MAPK, but it is plausible that this 
will not provide completely independent signals from each tran- 
script. Instead, we are likely to see dependent or highly correlated 
signals. As such the effective dimension of a pathway could be 
much smaller than the number of transcripts, and dimension 



reduction/variable selection techniques can help us find a rep- 
resentation of the pathway in this smaller, lower dimensional 
space. This enables us to model high dimensional epigenomic 
data using methods that are more effective in lower dimensional 
contexts (Fig. 4). 
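
The Markov chain view described above can be made concrete with a short sketch that estimates transition probabilities from observed state sequences and propagates a population forward in time. The state labels and the sequences are hypothetical, and the states are treated as directly observed; a hidden Markov model would add an inference step that is not shown here.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def step(distribution, transitions):
    """Advance a distribution over states by one time point."""
    out = defaultdict(float)
    for state, p in distribution.items():
        # States with no observed exits are treated as absorbing.
        for nxt, q in transitions.get(state, {state: 1.0}).items():
            out[nxt] += p * q
    return dict(out)


# Hypothetical observed trajectories of individual cells.
sequences = [
    ["stem", "stem", "progenitor", "differentiated"],
    ["stem", "progenitor", "progenitor", "differentiated"],
    ["stem", "stem", "stem", "progenitor"],
]
P = estimate_transitions(sequences)
after_two = step(step({"stem": 1.0}, P), P)
print(P)
print(after_two)
```

With high dimensional epigenomic data, the "states" would typically be defined after dimension reduction, for example as clusters of expression profiles, rather than assigned by hand as in this sketch.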

Another powerful tool for modeling temporal processes is a 
dynamic graphical model (DGM). 66,67 In fact, a hidden Markov 
model can be thought of as the simplest of DGMs. In brief, 
DGMs are graphical models (probabilistic models of covariance/dependence 
among groups of variables) that provide fully 
Bayesian inference for time series data (for those unfamiliar with 
Bayesian statistics we recommend Bernardo et al., 2009). 68 A 
graphical model is defined by a graph $(V, E)$, where $V$ represents 
the vertices and $E$ represents the edges, and a set of properties, 
which determine a family of probability distributions on the variables 
represented in the graph. For example, we may have a set of 
transcripts, which we represent by $X_1, X_2, \ldots, X_N$, where $N = |V|$. These 
transcripts can be represented as vertices in the graph with edges 
between them, and the set of properties for the graph define the 
probability distributions on the transcripts. These distributions 
specify how the transcripts co-vary or depend on one another (or 
not). There are many different types of graphical models; each 
type is defined by what graphs and what graphical properties are 
allowed. Many existing cancer cell cycle models neglect the autocorrelation 
between successive measurements, which has been shown 
to lead to an overestimation of the number of cycling variables. 69 
Since the model search for a DGM can involve dependencies 
among transcripts, this type of model can avoid this potential 
pitfall. 
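
The remark that a hidden Markov model is the simplest DGM can be made explicit by writing out its joint distribution. The notation here is introduced only for illustration: $S_t$ denotes the hidden state at time $t$, $Y_t$ the observed measurement at time $t$, and $T$ the number of time points.

```latex
P(Y_{1:T}, S_{1:T}) \;=\; P(S_1)\, P(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1})\, P(Y_t \mid S_t)
```

Each factor corresponds to one type of edge in the graph: $S_{t-1} \to S_t$ carries the temporal dependence, and $S_t \to Y_t$ links the hidden state to the data. More general DGMs allow additional edges, for instance among several variables within a time point, at the cost of a larger model space to search.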

Apart from statistical techniques for epigenomics there are 
techniques for mathematical modeling of biological networks 
that represent a key to modern systems biology. 70 Ideally, we 
would like to develop such models to understand the roles of 
ncRNAs in the regulatory networks that underlie GBM initia- 
tion and progression. A recent review by Lim et al. 71 describes 
how complex regulatory functions of cells may be conceptualized 
as a system of dynamic functions, and, as such, may be modeled 
by combinations of core network motifs. They posit that in terms 
of molecular networks or algorithms, it may be possible to limit 
the space of models for a given cellular regulatory function and 
hence identify and use such models to generate insights into the 
regulatory process. 

What is interesting is that multiple modeling approaches can 
simulate the same biological phenomenon. Researchers have 
focused on identifying the most useful and accurate model. From 
a statistician's perspective this problem of model selection is one 
of prediction and robustness. A useful model will (1) be robust to 
changes in model assumptions and (2) generate accurate predictions 
in a specific context. Many models involve tuning parameters 
for which values must be chosen. In some cases, methods 
exist that provide reasonable estimates for these values, but in 
other cases the estimates are arbitrary. Ideally we would like any 
modeling results to be robust to these arbitrary choices, i.e., small 
changes in the tuning parameters should lead to small changes in 
model outputs. This gives us confidence that an accurate model 
reflects a biological truth, not the luck of the analyst. 
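
One simple way to probe this kind of robustness is to perturb a tuning parameter slightly and measure how much the fitted quantity moves. The Python sketch below does this for a one-predictor ridge regression through the origin, which is used here only as a convenient stand-in for a model with a tuning parameter. The function names, the penalty values, and the data are illustrative assumptions.

```python
def ridge_slope(x, y, penalty):
    """Slope of a one-predictor ridge regression through the origin.

    The estimate is sum(x*y) / (sum(x*x) + penalty); penalty is the tuning
    parameter, and penalty = 0 gives ordinary least squares.
    """
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


def relative_change(x, y, penalty, perturbation):
    """Relative change in the slope when the penalty is shifted slightly."""
    base = ridge_slope(x, y, penalty)
    shifted = ridge_slope(x, y, penalty + perturbation)
    return abs(shifted - base) / abs(base)


# Hypothetical data: y is exactly twice x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# A 10% change in the penalty (1.0 -> 1.1) should move the slope only slightly.
print(relative_change(x, y, penalty=1.0, perturbation=0.1))
```

If a small perturbation produced a large relative change, the conclusions drawn from the model would depend heavily on an arbitrary choice, which is the situation the text warns against.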

Figure 3. A bioinformatics and statistical pipeline for identifying epigenetic targets for GBM from transcriptome data. Hypothetical pipeline for
identifying epigenetic targets in GBM based on pathways that are differentially expressed in both differentiating neural stem cells and GBM. Left panel:
Differentiating neural stem cells are analyzed for changes in RNA transcript levels by RNA sequencing, which yields the transcripts expressed
over time. Transcripts are mapped/aligned to the human genome using TopHat, and the aligned transcripts are then quantified using Cufflinks
or a similar bioinformatics pipeline. Statistical filtering by t-tests or analysis of variance after
quantification yields differentially expressed genes. Genes are then clustered by expression pattern to identify RNAs that are associated with
differentiation pathways. Right panel: RNA sequencing of GBM and control tissue is performed to identify differentially expressed genes using the
same bioinformatics pipeline used to analyze the differentiating neural stem cells. The degree of overlap between the transcripts that are differentially
expressed during differentiation and those that are differentially expressed in GBM is then calculated to identify epigenetic targets in GBM.

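
The final overlap step of the pipeline in Figure 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the gene identifiers and universe size are hypothetical, and the hypergeometric tail probability is one common way to ask whether an overlap is larger than expected by chance, chosen here for illustration.

```python
from math import comb


def overlap_enrichment(set_a, set_b, universe_size):
    """Return the shared genes and a hypergeometric tail probability.

    The probability is P(overlap >= observed) when len(set_b) genes are
    drawn at random, without replacement, from a universe of universe_size
    genes that contains len(set_a) genes of interest.
    """
    shared = set_a & set_b
    k, size_a, size_b = len(shared), len(set_a), len(set_b)
    total = comb(universe_size, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe_size - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return shared, tail / total


# Hypothetical differentially expressed gene lists (placeholder identifiers).
differentiation_genes = {"GENE_A", "GENE_B", "GENE_C"}
gbm_genes = {"GENE_B", "GENE_C", "GENE_D"}

shared, p_value = overlap_enrichment(differentiation_genes, gbm_genes, universe_size=10)
print(sorted(shared), p_value)
```

In practice the universe would be the set of all quantified transcripts, and the resulting candidate list would still need multiple-testing control and biological validation.
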
Figure 4. Examples of statistical models for neural temporal data. (A) Time series. Each transcript is modeled separately (univariate)
or as part of a group (multivariate). The model uses information from previous time points in modeling future time points, and can capture
contemporaneous and lagged dependencies among transcripts. (B) Discrete Markov chain model. Each cellular stage is considered a "state" and the
chain models the probabilities of moving from one "state" to another in a given time step. Depending on the type of Markov model it may or may not
be possible to move both backward and forward in time, and hence for cells to differentiate as well as dedifferentiate. (C) Bayesian network model.
If we consider a directed acyclic graph (DAG), then we define a joint probability distribution over cellular states. For each node or state we define a
probability distribution for transcription in each state, conditional on transcription in previous states. If we consider a dynamic graphical model (DGM),
then we can model each state with a graphical model, and separately model the movement from state to state across time. In this way transcripts can
have contemporaneous as well as time-dependent relationships. NSC, neural stem cell; OPC1, oligodendrocyte precursor cell 1; OPC2, oligodendrocyte
precursor cell 2; Olig, oligodendrocyte; GBM, glioblastoma cell. $Pa_1$ is the probability that a neural stem cell remains a stem cell from one time
point to the next. $Pa_2$ is the probability that a neural stem cell transforms from the current time point to the next time point. $Pb_3$ is the probability that
a GBM cell dedifferentiates from the current time point to the prior time point. $Pb_1$ is the probability that a GBM cell remains a GBM cell from the
current time point to the next. $Pc_2$ is the probability that an oligodendrocyte precursor cell (OPC) transforms from the current time point to the next. $Pc_1$
is the probability that an OPC remains an OPC from the current time point to the next. $Pb_2$ is the probability that a GBM cell dedifferentiates into an
OPC from the current time point to the next time point. $Pc_4$ is the probability that an OPC differentiates into an oligodendrocyte from the current time
point to the next time point. $Pc_3$ is the probability that an OPC dedifferentiates into a neural stem cell from the current time point to the next. $Pa_3$ is the
probability that an NSC differentiates into an OPC from the current time point to the next. $Pd_1$ is the probability that an oligodendrocyte will remain an
oligodendrocyte from the current time point to the next. $Pd_2$ is the probability that an oligodendrocyte dedifferentiates into an OPC from the current
time point to the next. $Pa_1 + Pa_2 + Pa_3 = 1$; $Pb_1 + Pb_2 + Pb_3 = 1$; $Pc_1 + Pc_2 + Pc_3 = 1$.
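
The discrete Markov chain in panel (B) can be sketched in Python as a table of transition probabilities plus a function that advances a population one time step. Every numeric value below is a hypothetical placeholder, not an estimate from data. Two further assumptions are made for illustration: "transforms" is read as a transition into the GBM state, and the target of the third GBM transition is taken to be the NSC state. The sketch also requires all outgoing probabilities of a state to sum to one, which for OPCs includes the transition to oligodendrocytes.

```python
# Hypothetical transition probabilities; each row lists where a cell in that
# state may be found at the next time step. Values are placeholders only.
TRANSITIONS = {
    "NSC": {"NSC": 0.90, "GBM": 0.01, "OPC": 0.09},                # Pa1, Pa2, Pa3
    "OPC": {"OPC": 0.80, "GBM": 0.01, "NSC": 0.04, "Olig": 0.15},  # Pc1, Pc2, Pc3, Pc4
    "Olig": {"Olig": 0.98, "OPC": 0.02},                           # Pd1, Pd2
    "GBM": {"GBM": 0.95, "OPC": 0.03, "NSC": 0.02},                # Pb1, Pb2, Pb3 (NSC target assumed)
}


def check_rows(transitions, tolerance=1e-9):
    """Raise ValueError if any state's outgoing probabilities do not sum to 1."""
    for state, row in transitions.items():
        if abs(sum(row.values()) - 1.0) > tolerance:
            raise ValueError(f"probabilities for state {state!r} do not sum to 1")


def step(distribution, transitions):
    """Advance a distribution over cell states by one time step."""
    next_distribution = {state: 0.0 for state in transitions}
    for state, mass in distribution.items():
        for target, probability in transitions[state].items():
            next_distribution[target] += mass * probability
    return next_distribution


check_rows(TRANSITIONS)
population = {"NSC": 1.0, "OPC": 0.0, "Olig": 0.0, "GBM": 0.0}
for _ in range(10):
    population = step(population, TRANSITIONS)
print(population)
```

Setting a backward transition such as the Olig-to-OPC entry to zero gives a chain in which that dedifferentiation step cannot occur, which is the distinction between model types noted in the caption.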



Historically, scientists have emphasized good model fit by
selecting a model based on how well it explained the observed 
data. However, in contexts where the number of variables, p, is 
much larger than the number of observations, n, this can lead to 
overfitting. The model fits the observed data so well that it is spe- 
cific to that data only, and fails to explain the broader biological 
phenomenon at work. In this scenario, the model cannot predict 
future observations from the same context as the observed data, 
nor can it predict observations from similar contexts. We have 
seen overfitting at work in research involving microarrays as well 
as other genome-wide technologies. 72,73 One way to avoid overfit- 
ting is to focus on how well a given model can predict data from 
similar studies, or data from the same study that is not used in 
building the model. 
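
The effect of having far more variables than observations can be reproduced with pure noise. The Python sketch below is a minimal simulation under assumed settings (10 training samples, 200 held-out samples, 2000 candidate features, all names and sizes hypothetical). It picks the feature that best correlates with the outcome in the training samples and then checks the same feature on held-out samples. Because every feature is random, any apparent association in the training data is an artifact of selection.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_noise_feature(n_train=10, n_test=200, p=2000, seed=0):
    """Select the noise feature most correlated with a noise outcome.

    Returns (training correlation, held-out correlation) for the feature
    chosen using the training samples alone.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best_train, best_test = 0.0, 0.0
    for _ in range(p):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r_train = pearson(feature[:n_train], outcome[:n_train])
        if abs(r_train) > abs(best_train):
            best_train = r_train
            best_test = pearson(feature[n_train:], outcome[n_train:])
    return best_train, best_test


train_r, test_r = best_noise_feature()
print(f"training correlation: {train_r:.2f}, held-out correlation: {test_r:.2f}")
```

The selected feature typically shows a strong correlation in the training samples and a correlation near zero in the held-out samples, which is why evaluation on data not used for model building gives a more honest picture of predictive value.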

The complexity of modeling for systems biology is increasing 
as investigators collect more complex data types. Issues such as
prediction and robustness will be critical in sifting through these
data and in using modeling approaches to identify and explain
potential pharmacologic targets in GBM.

Conclusions 

The medical and scientific community has made remarkable 
advances in defining the cellular, genetic, and epigenetic origin 
of GBM. Many of these advances have come through studying 
neurogenesis and epigenetic regulation. The hope is that these 
advances can be quickly translated into therapeutic benefits for 
patients who are suffering from this incurable and aggressive 